Abstract: This paper presents an algorithm for 2.5D X-clock tree synthesis based on the stacked-layer combination of voltage islands, with the goal of reducing both power consumption and clock delay. Double-via insertion is also considered for via-effect avoidance and reliability. The algorithm reduces the complexity of 3D clock tree construction for a stacked-layer chip. The clock network is first partitioned into a number of voltage islands distributed on each layer, in L-type or T-type arrangements, and an X-clock tree is constructed for each voltage island. These X-clock trees are then integrated on each layer through a well-defined connection sequence with inserted level shifters, and across layers with TSVs, to minimize power with the best trade-off between power and delay. Experimental results show that our approach saves 10.94% in power and 35.185% in delay on average.
I. Introduction

In current nanometer process technology, a SoC (system-on-a-chip) integrates many functional modules and usually supports multimode operation, in which different sets of modules work during different time periods. If all modules are supplied with a uniform voltage, the chip consumes the same power throughout its operation. To save power, the voltage-island design methodology [1] assigns multiple supply voltages (MSV) to the functional modules of a SoC. Performance-critical modules are assigned the highest supply voltage to meet their speed requirements at the cost of higher power consumption. Noncritical modules can operate at lower voltages, giving up speed they do not need in exchange for lower power. Power and speed can thus be traded off within a SoC.

3D stacked-IC technology is emerging as a successor to 2D SoC design. The expected advantages of 3D technology are shorter interconnections, obtained by inserting many TSVs (through-silicon vias), and higher chip performance [2]. In a 3D IC, an MSV design supplies multiple voltages that form a number of voltage islands for managing power consumption. The clock network across the voltage islands of a 3D IC is therefore more complex, and constructing a 3D clock tree that minimizes both power consumption and clock delay is challenging.

Various voltage-island works have addressed 2D or 3D SoC designs. Lee et al. [3] proposed voltage-island partitioning and floorplanning under timing constraints. Yu et al. [4] presented a floorplanning approach driven by multi-voltage and level-shifter assignment. A power-driven global routing approach for multiple voltage islands [5] was proposed to evaluate power reduction. Lee et al. [6] proposed voltage-island-based floorplanning that optimizes both power consumption and temperature.

In addition, many works [7,8,9] concentrated on clock tree construction for a single voltage island. Tsai et al. [10] proposed clock tree construction on multimode multivoltage islands: using a binary clustering approach, they inserted buffers and adjusted their locations to minimize clock delay and clock skew across the different operating modes. Lin et al. [11] improved this approach by replacing some inserted buffers with adjustable delay buffers (ADBs) to control the clock delay under a bounded skew. Kim [12] extended clock tree synthesis to 3D stacked ICs. Under a zero-skew requirement, the clock tree construction depends on the trade-off between the number of TSVs and the total wire length: deferred layer embedding (DLE) is used to reduce the number of TSVs, while deferred merge embedding is employed to minimize the total wire length. Chen et al. [13] proposed a 3D-IC clock tree that is constructed on the ASIC layer and associated with the predefined clock network on the platform layer; the TSVs on the ASIC layer projected from the platform layer are controlled to minimize clock delay and skew. Wang et al. [14] presented prebond testability for two individual clock trees on the upper and bottom layers, which are then combined by inserting TSVs and TSV buffers to form a 3D clock tree.

In this paper, we propose an algorithm for 2.5D X-clock tree construction on stacked-layer multivoltage islands, extending the work in [15]. An X-clock tree is constructed individually for each voltage island on each layer, and these clock trees are combined with the associated level shifters to obtain the one with the best trade-off between power and delay under zero skew. The best X-clock trees on the different layers are then integrated with a number of TSVs to form the X-clock tree of the stacked-layer chip. Experimentally, the complexity of 3D clock tree construction is reduced effectively, and power and delay can be traded off.

The rest of this paper is organized as follows. Section II gives the problem formulation for clock tree construction. Section III introduces the estimation of the power consumption and interconnection delay of a clock tree, including the delay models of a metal wire, a TSV, a double via, and a level shifter. Section IV presents the proposed algorithm and its procedures in detail. Experimental results on benchmarks are reported in Section V. Finally, the conclusion and extensions of this work are given in Section VI.


II. Problem Formulation 


The clock network problem of a 3D stacked-layer chip can be explained as follows. Fig. 1 shows an example of a two-layer clock network. As shown in Fig. 1(a), the stacked-layer chip consists of two layers with twelve clock sinks. Each layer contains three voltage islands and six clock sinks. Islands 1, 2, and 3 operate at supply voltages of 1.0 V, 1.1 V, and 1.2 V, respectively. Fig. 1(b) shows the direct approach, in which a 3D clock tree is constructed to connect all twelve sinks located on the three voltage islands of the two stacked layers. Six level shifters are required for the interconnections from the low-voltage island (Island 1) to the two higher-voltage islands (Island 2 and Island 3), and four TSVs are required to connect the two stacked layers.

Fig. 1 (a) 3D clock network of two stacked layers, (b) 3D clock tree with six LSs and four TSVs, (c) 2.5D clock tree on the upper layer, (d) 2.5D clock tree on the bottom layer, and (e) 2.5D clock tree with four LSs and one TSV.


Alternatively, Figs. 1(c)-1(e) show a divide-and-conquer approach that reduces the complexity of the previous 3D clock tree construction. Figs. 1(c) and 1(d) are two individual clock trees for the upper and bottom layers, respectively, each with only two level shifters. Fig. 1(e) integrates the two clock trees of Figs. 1(c) and 1(d) into a 2.5D stacked-layer clock tree by inserting one TSV. Compared with the 3D clock tree in Fig. 1(b), the 2.5D clock tree in Fig. 1(e) saves two level shifters and three TSVs and also reduces power and delay.


Thus, the problem of clock tree construction on multiple voltage islands of a stacked-layer chip can be defined as follows.

Given a set of clock sinks on a set of multivoltage islands of a stacked-layer chip, the objective is to construct a zero-skew X-clock tree with the best trade-off between power and delay.

Here, we employ the latter divide-and-conquer approach to construct a 2.5D stacked-layer clock tree on the combination of multiple voltage islands for reducing power and delay.


III. The Estimation of Power and Delay 

3.1 Power Estimation 

The power consumption of a clock tree depends on three major factors: the supply voltage $V_{dd}$, the working frequency $F_{clk}$, and the loading capacitance $C_{load}$. The loading capacitance includes the equivalent wire capacitance of all interconnections, the input capacitance of all inserted level shifters, and the input capacitance of all clock sinks. The total power consumption $P_{total}$ is formulated as

$$P_{total} = \sum_{i} C_{load,i}\, F_{clk}\, V_{dd}^{2} \qquad (1)$$

where $C_{load,i}$ is the capacitance of sink $i$ (or node $i$) and $e_i$ is defined as the set of clock tree edges along the path from the root of the clock tree to sink $i$. From (1), a lower $V_{dd}$ gives lower power consumption. The supply voltage $V_{dd}$ depends on the voltage island: Island 1 (1.0 V), Island 2 (1.1 V), or Island 3 (1.2 V). The total power consumption $P_{total}$ is the sum of the power dissipated in all voltage islands.
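
The effect of the supply voltage in (1) can be illustrated with a minimal Python sketch. It assumes the usual dynamic-power form in which $V_{dd}$ enters squared; the per-island load capacitances are hypothetical values chosen only for illustration, and the function names are not from the paper.

```python
# Sketch of Eq. (1): clock power summed over voltage islands.
# Assumption: dynamic power C * F * V^2; the loads below are made-up values.
F_CLK_HZ = 100e6  # clock frequency listed in Table 1


def island_power_w(c_load_f, v_dd, f_clk_hz=F_CLK_HZ):
    """Power of one island in watts: C_load * F_clk * V_dd^2."""
    return c_load_f * f_clk_hz * v_dd ** 2


def total_power_w(islands, f_clk_hz=F_CLK_HZ):
    """islands: iterable of (load capacitance in farads, supply voltage in volts)."""
    return sum(island_power_w(c, v, f_clk_hz) for c, v in islands)


# Hypothetical example: the same 300 pF load, split over three islands
# at 1.0 V, 1.1 V, and 1.2 V versus a single island at 1.2 V.
p_multi = total_power_w([(100e-12, 1.0), (100e-12, 1.1), (100e-12, 1.2)])
p_single = total_power_w([(300e-12, 1.2)])
```

With these illustrative loads the multi-voltage assignment consumes less power than the single 1.2 V island, which is the motivation for the voltage-island approach.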

3.2 Wire Delay Model and Estimation 

The clock delay is defined as the maximal delay from the clock source to any clock sink of the clock tree. The wire delay of a clock tree is calculated with the fitted Elmore delay (FED) model [16]. A wire $j$ with width $w_j$ and length $l_j$ under the FED model is shown in Fig. 2, where $r$, $c_a$, and $c_f$ are the sheet resistance, unit area capacitance, and fringing capacitance, respectively. The delay of wire $j$ with a loading capacitance $C_{L,i}$ at sink $i$ is formulated as

$$Delay_j = \frac{r\, l_j}{w_j}\left[0.5\,(D\, c_a w_j + E\, c_f)\, l_j + F\, C_{L,i}\right] \qquad (2)$$

where the coefficients $D$, $E$, and $F$ are obtained by curve fitting [16].
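
As a rough illustration of (2), the following Python sketch evaluates the FED wire delay with the parameter values listed in Table 1. The function name and the choice of units are assumptions: with resistances in ohms and capacitances in fF, the result comes out in femtoseconds.

```python
import math

# Parameter values taken from Table 1 of this paper.
LN2 = math.log(2.0)
R_SHEET = 0.623      # r
C_AREA = 0.00598     # c_a (fF-based)
C_FRINGE = 0.043     # c_f (fF-based)
D = 1.12673 * LN2
E = 1.10463 * LN2
F = 1.04836 * LN2


def fed_wire_delay(length, width, c_load):
    """Eq. (2): (r*l/w) * [0.5*(D*c_a*w + E*c_f)*l + F*C_L].

    Returns ohm*fF, i.e. femtoseconds, when c_load is given in fF.
    """
    resistance = R_SHEET * length / width
    wire_cap_term = 0.5 * (D * C_AREA * width + E * C_FRINGE) * length
    return resistance * (wire_cap_term + F * c_load)
```

Because the wire capacitance term grows with the length, the delay grows quadratically in $l_j$, so doubling a wire more than doubles its delay.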

A TSV (through-silicon via) shortens the interconnection between 3D stacked layers. A TSV can be modeled as an equivalent RC circuit. Similarly to (2), the delay of a TSV with a loading capacitance $C_{L,i}$ at sink $i$ is

$$Delay_j = r_{TSV}\left[0.5\, D\, c_{TSV} + F\, C_{L,i}\right] \qquad (3)$$

where $r_{TSV}$ and $c_{TSV}$ are the resistance and capacitance of a TSV.
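
A small Python sketch of (3) follows, using the TSV values from Table 1. It assumes that $c_{TSV}$ is a lumped capacitance in fF, since (3) contains no length term; the function name is illustrative.

```python
import math

LN2 = math.log(2.0)
D = 1.12673 * LN2   # FED coefficient from Table 1
F = 1.04836 * LN2   # FED coefficient from Table 1
R_TSV = 0.035       # TSV resistance in ohms (Table 1)
C_TSV = 15.48       # TSV capacitance, assumed lumped, in fF (Table 1)


def tsv_delay(c_load):
    """Eq. (3): r_TSV * [0.5*D*c_TSV + F*C_L], in femtoseconds for c_load in fF."""
    return R_TSV * (0.5 * D * C_TSV + F * c_load)
```

Under this reading the intrinsic delay of an unloaded TSV is only a fraction of a femtosecond, so the TSV mainly matters through the capacitance it adds to the upstream wire.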

Double-via insertion offers via-effect avoidance and improved reliability. In a double via, the inserted redundant via is always in parallel with the single via. Hence, the resistance of a double via is half that of a single via, and its capacitance is double. Fig. 3 shows the equivalent circuit of a double via, where $k$ is two [17]. The delay is calculated in the same way as in (2).
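
The parallel combination described above can be written out explicitly. Here $r_v$ and $c_v$ denote the resistance and capacitance of a single via; these two symbols are introduced only for this illustration.

```latex
R_{dv} = \frac{r_v}{k}, \qquad
C_{dv} = k\, c_v, \qquad
k = 2
\;\;\Longrightarrow\;\;
R_{dv}\, C_{dv} = r_v\, c_v
```

The intrinsic RC product of the via is therefore unchanged, while the halved resistance reduces the delay contribution of any downstream load capacitance.
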

Fig. 2 The equivalent circuit of a wire j.

Fig. 3 The equivalent circuit of a double via.


3.3 Level Shifter Model and Delay Estimation

A level shifter (LS) is inserted at the interface from a low-voltage island to a high-voltage island. As reported in [1], a level shifter consumes power and adds delay. Fig. 4 shows the equivalent circuit of a level shifter, which contains the intrinsic delay $T_{LS}$, input capacitance $c_{LS}$, and output resistance $r_{LS}$.

Fig. 4 The equivalent circuit of a level shifter.

Fig. 5 The equivalent circuit of a level shifter driving a capacitive load.


When a level shifter drives wire $j$ with a loading capacitance $C_{L,i}$, as shown in Fig. 5, the delay is formulated as

$$Delay_j = T_{LS} + \left(r_{LS} + \frac{r\, l_j}{w_j}\right)\left[0.5\,(D\, c_a w_j + E\, c_f)\, l_j + F\, C_{L,i}\right] \qquad (4)$$

where the coefficients $D$, $E$, and $F$ and the width $w_j$ and length $l_j$ of wire $j$ follow the FED model.
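
The level-shifter stage delay in (4) can be sketched in Python as below, with the parameter values from Table 1. The unit handling is an assumption: $T_{LS}$ is given in ps while the RC term evaluates to ohm times fF (femtoseconds), so the sketch converts the RC term to ps before adding.

```python
import math

LN2 = math.log(2.0)
R_SHEET, C_AREA, C_FRINGE = 0.623, 0.00598, 0.043   # r, c_a, c_f (Table 1)
D, E, F = 1.12673 * LN2, 1.10463 * LN2, 1.04836 * LN2
R_LS = 250.0   # level-shifter output resistance in ohms (Table 1)
T_LS = 54.4    # level-shifter intrinsic delay in ps (Table 1)
FS_PER_PS = 1000.0


def level_shifter_stage_delay_ps(length, width, c_load_ff):
    """Eq. (4): T_LS + (r_LS + r*l/w) * [0.5*(D*c_a*w + E*c_f)*l + F*C_L]."""
    resistance = R_LS + R_SHEET * length / width
    cap_term = 0.5 * (D * C_AREA * width + E * C_FRINGE) * length + F * c_load_ff
    return T_LS + resistance * cap_term / FS_PER_PS
```

For a zero-length wire and no load, the function returns the intrinsic delay $T_{LS}$ alone, which matches the role of $T_{LS}$ in Fig. 4.
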

IV. 2.5D X-clock Tree Synthesis based on Voltage-Island Combination

4.1 The Proposed Algorithm

Constructing a 3D clock tree across different voltage islands is complex and difficult. To reduce this complexity, we adopt a divide-and-conquer approach, 2.5D stacked-layer clock tree construction, based on the combination of multiple voltage islands for reducing power and delay. For each stacked layer, we construct an X-clock tree for each voltage island and then integrate these X-clock trees through a well-defined connection sequence, inserting level shifters, to minimize power and delay. This yields one X-clock tree per stacked layer. Finally, we integrate these per-layer X-clock trees by inserting a few TSVs to obtain the best trade-off between power and delay.

Our clock tree construction uses the X-architecture. With advanced lithography technologies, metal wires in a chip can be routed at arbitrary angles, in particular as diagonal (±45°) wires assigned to metal layers 3 and 4. The X-architecture combines diagonal, horizontal, and vertical wires and achieves improvements of 10%, 20%, 30%, and 20% in chip performance, power consumption, die cost, and wirelength, respectively, compared with the Manhattan architecture [17].

Moreover, redundant-via insertion (RVI) is included in our X-clock tree construction to improve yield and reliability. RVI, also called double-via insertion (DVI), is a well-known and effective method that semiconductor foundries highly recommend for reducing via failures.

Fig. 6 shows the proposed algorithm for 2.5D X-clock tree construction based on the stacked-layer combination of voltage islands, called 2.5D-MuVIX-DVI. The input is a set of clock sinks on a set of voltage islands $VI$ and a set of supply voltages $SV$ of a stacked-layer chip, and the output is a zero-skew X-clock tree of the stacked-layer chip with the best trade-off between power and delay. For each stacked layer $j$ with a set of supply voltages $SV$, the supply voltage $SV_{LCS\text{-}j}$ of the local clock source $LCS_j$ is first determined as

$$SV_{LCS\text{-}j} = \min_{\forall vi_k} SV_k \qquad (5)$$

Each voltage island $vi_k \in VI$ can operate at several supply voltages $SV_k = \{sv_1, sv_2, \ldots\}$. In this work, we set $SV_{LCS\text{-}j}$ to the lowest supply voltage of all the voltage islands on layer $j$ to minimize power consumption, although level shifters are then required at the low-to-high voltage-island interfaces.
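
The rule in (5) and its consequence for level shifters can be sketched in a few lines of Python. The island names, the dictionary layout, and both function names are assumptions made for this illustration.

```python
def local_clock_source_voltage(island_voltages):
    """Eq. (5): lowest supply voltage over all islands on a layer.

    island_voltages: dict mapping island name -> list of allowed supply voltages.
    """
    return min(min(voltages) for voltages in island_voltages.values())


def islands_needing_level_shifter(operating_voltage, source_voltage):
    """Islands that run above the clock-source voltage need a level shifter."""
    return sorted(name for name, v in operating_voltage.items() if v > source_voltage)


# Example with the three islands used throughout the paper.
island_voltages = {"Island1": [1.0], "Island2": [1.1], "Island3": [1.2]}
sv_lcs = local_clock_source_voltage(island_voltages)
operating = {name: voltages[0] for name, voltages in island_voltages.items()}
needs_ls = islands_needing_level_shifter(operating, sv_lcs)
```

With a 1.0 V local clock source, Island 2 and Island 3 each require a level shifter, which agrees with the two level shifters per layer in Figs. 1(c) and 1(d).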


Algorithm: 2.5D-MuVIX-DVI

Input: A set of clock sinks on a set of voltage islands VI and a set of supplying voltages SV of a stacked-layer chip.

Output: A zero-skew X-clock tree of a stacked-layer chip for the best trade-off in power and delay.

{ For each stacked layer j
  { Determine the supplying voltage SV_LCS-j of the local clock source LCS_j.
    PMXF(VI);  // Construct an X-clock tree for each voltage island ∈ VI.
    DVI-X;     // Insert double vias into the X-clock tree of each voltage island ∈ VI.
    Let each constructed X-clock tree be a leaf node and obtain the connection sequences CS(VI) based on SV_LCS-j.
    For each connecting sequence vi ∈ CS(VI)
    { Obtain the connection sequences CS(LS) for level-shifter insertion.
      Do make the combination for each ls ∈ CS(LS)
      While (Power_j is improved with a reasonable delay)
    }
  }
  Integrate all the X-clock trees on each stacked layer by inserting a few TSVs with the trade-off in power and delay, and determine the system clock source.
}

Fig. 6 The proposed algorithm of 2.5D X-clock tree synthesis. 

Fig. 7(a) shows an example in which a stacked layer has three voltage islands with different supply voltages: Island 1, Island 2, and Island 3. PMXF first constructs the X-clock subtree for each voltage island, and DVI-X then inserts double vias into the subtree. Fig. 7(b) shows the X-clock subtree of Island 2 with the clock source $CLK_2$. The clock source $CLK_2$ can be viewed as a leaf node, Leaf-node$_2$, whose supply voltages are $SV_2 = \{sv_1, sv_2, \ldots\}$, as shown in Fig. 7(c). Similarly, Leaf-node$_3$ and $SV_3$ represent the clock source and supply voltages of Island 3, respectively.



Fig. 7 (a) A stacked layer has three voltage islands, (b) PMXF and DVI-X construct the X-clock tree of Island 2, and (c) the X-clock tree can be viewed as a leaf node.


Then, we integrate these leaf nodes to obtain the best tree for power minimization. All possible connecting sequences of the voltage islands, $CS(VI)$, are enumerated. For a connecting sequence $vi \in CS(VI)$ associated with the $SV_{LCS\text{-}j}$ of these islands, level shifters must be inserted at the interfaces from low-voltage to high-voltage islands. After that, we estimate the power consumption for each connecting sequence $ls \in CS(LS)$, where $CS(LS)$ is the set of all connecting sequences with level-shifter insertion. Thus, we obtain a multivoltage-island-based X-clock tree with the well-defined connecting sequence of minimum power on each layer.


4.2 The Procedure of PMXF 


The procedure PMXF [8] is employed to construct the X-clock tree for each voltage island in the set of voltage islands $VI$. PMXF constructs an X-architecture zero-skew clock tree with minimum delay. It adopts several strategies: an X-pattern library is defined to simplify the merging procedure of the DME approach, an X-Flip technique is used to reduce the wirelength between paired points, and a wire sizing technique is applied to achieve zero skew. A detailed explanation of the PMXF procedure is given in [8].

Fig. 8 shows how PMXF constructs an X-architecture zero-skew clock tree for eight given sinks. First, sink $s_4$ is taken and connected to sink $s_6$, which has the shortest X-architecture distance to it. Then, sink $s_2$ is matched with sink $s_1$, sink $s_3$ is paired with sink $s_5$, and sink $s_8$ is connected with sink $s_7$, again by shortest X-architecture distance. Their tapping points $s_9$, $s_{10}$, $s_{11}$, and $s_{12}$, each with its zero-skew ratio $v$ ($0 < v < 1$), are then determined. By recursively applying the above procedure to each pair of points, we complete the X-architecture zero-skew clock routing.
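
The pairing step can be illustrated with a simplified Python sketch. It is not the PMXF procedure itself: the X-pattern library, the X-Flip technique, wire sizing, and the tapping-point computation are all omitted, and the order in which sinks are picked is an assumption. The distance function uses the standard octilinear metric for routing with horizontal, vertical, and 45-degree segments.

```python
import math


def x_distance(p, q):
    """Shortest path length between p and q using H, V, and 45-degree wires."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return max(dx, dy) + (math.sqrt(2.0) - 1.0) * min(dx, dy)


def pair_sinks(sinks):
    """Greedy pairing: take the next unpaired sink and pair it with the
    nearest unpaired sink in X-architecture distance.

    Returns (list of index pairs, list of leftover indices).
    """
    unpaired = list(range(len(sinks)))
    pairs = []
    while len(unpaired) >= 2:
        a = unpaired.pop(0)
        b = min(unpaired, key=lambda k: x_distance(sinks[a], sinks[k]))
        unpaired.remove(b)
        pairs.append((a, b))
    return pairs, unpaired


# Hypothetical sink coordinates.
sinks = [(0, 0), (10, 10), (1, 0), (10, 11)]
pairs, leftover = pair_sinks(sinks)
```

In a full bottom-up construction, the tapping point of each pair would become a new node and the pairing would be repeated until one root remains.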



Fig. 8 An eight-sink X-architecture zero-skew clock tree constructed by the PMXF procedure.


4.3 The Procedure of DVI-X 


Another procedure, DVI-X [9], is applied in the post-stage of X-clock tree construction to improve the double-via insertion rate, yield, and reliability. DVI-X first constructs the bipartite graph of the partitioned X-clock tree with the original single vias and the candidate redundant vias, and then applies the augmenting path approach, together with the construction of maximal cliques, to obtain the matching solution from the bipartite graph. Moreover, because inserted double vias may affect the clock skew, a skew tuning technique is further applied to restore zero skew. A detailed explanation of the DVI-X procedure is given in [9].
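
The augmenting-path idea behind the matching step can be sketched with the classic maximum bipartite matching algorithm. This is only an illustration under simplifying assumptions: conflicts between redundant-via candidates are modeled by letting several vias share the same candidate identifier, and the maximal-clique construction and skew tuning of DVI-X are not reproduced. All names are hypothetical.

```python
def max_via_matching(candidates):
    """Match single vias to feasible redundant-via candidates (RVCs).

    candidates: dict mapping via name -> list of feasible RVC identifiers.
    Returns a dict via -> assigned RVC, where each RVC is used at most once.
    """
    rvc_owner = {}  # RVC identifier -> via currently assigned to it

    def try_assign(via, seen):
        # Search for an augmenting path starting at this via.
        for rvc in candidates[via]:
            if rvc in seen:
                continue
            seen.add(rvc)
            # Take a free RVC, or re-route its current owner elsewhere.
            if rvc not in rvc_owner or try_assign(rvc_owner[rvc], seen):
                rvc_owner[rvc] = via
                return True
        return False

    for via in candidates:
        try_assign(via, set())
    return {via: rvc for rvc, via in rvc_owner.items()}


# Hypothetical example: v1 and v2 compete for candidate "a".
matching = max_via_matching({"v1": ["a", "b"], "v2": ["a"], "v3": ["b", "c"]})
```

Here v1 is first given "a", then moved to "b" so that v2 can take "a", and v3 falls back to "c", so all three vias receive a redundant via.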

Fig. 9 shows an example in which DVI-X inserts six double vias into a partial X-clock tree with six single vias, so that every single via receives a redundant via. For instance, as shown in Fig. 9(b), the single via $v_1$ has up to eight positions for redundant-via candidates (RVCs) where a redundant via could be inserted. When inserting a redundant via, we must obey the via width and spacing rules to avoid creating any design-rule violation. Six of the eight positions (the six infeasible RVCs in Fig. 9(b)) are therefore not suitable for inserting a double via. Only one of the remaining two positions, $v_1^{RT}$ and $v_1^{LR}$ (the two RVCs in Fig. 9(b)), can receive the redundant via. To minimize the effect on the redundant-via insertion of the adjacent single via $v_2$, the $v_1^{RT}$ position is selected for the double via of the single via $v_1$.






Fig. 9 (a) An example of a partial X-clock layout with six inserted double vias and (b) one of eight possible RVC positions is suitable for inserting a double via.


4.4 The X-Clock Tree Combination of Voltage Islands 


After the X-clock tree has been constructed for each voltage island on each stacked layer, the constructed X-clock trees can be viewed as leaf nodes for integration. Fig. 10 shows an example used to explain the integration methods. The three voltage islands in the figure are labelled Leaf-node$_1$ for Island 1, Leaf-node$_2$ for Island 2, and Leaf-node$_3$ for Island 3, with supply voltages $SV_1$ = 1.0 V, $SV_2$ = 1.1 V, and $SV_3$ = 1.2 V, respectively. Hence, there are up to six connecting sequences (i.e., 3!), denoted $CS(VI) = \{vi_1, vi_2, vi_3, vi_4, vi_5, vi_6\}$, to consider for the minimization of power and delay. For example, for $vi_1 \in CS(VI)$, shown in Fig. 10(b), Leaf-node$_1$ and Leaf-node$_2$ are connected first and then joined with Leaf-node$_3$ to complete the voltage-island-based X-clock tree. That is, $vi_1$ is the connecting sequence $vi_1$ = {Leaf-node$_1$, Leaf-node$_2$, Leaf-node$_3$}. Fig. 10(c) shows another connecting sequence, $vi_6 \in CS(VI)$, which is $vi_6$ = {Leaf-node$_3$, Leaf-node$_2$, Leaf-node$_1$}.
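
The six connecting sequences can be enumerated directly, as in the short Python sketch below. The lexicographic order produced by `itertools.permutations` happens to place the paper's first and sixth sequences at the first and last positions; the order of the four sequences in between is an assumption, since the paper does not list them.

```python
from itertools import permutations

leaf_nodes = ("Leaf-node1", "Leaf-node2", "Leaf-node3")

# CS(VI): every ordering of the three leaf nodes (3! = 6 sequences).
connecting_sequences = list(permutations(leaf_nodes))

vi_1 = connecting_sequences[0]   # connect Leaf-node1 and Leaf-node2 first
vi_6 = connecting_sequences[-1]  # connect Leaf-node3 and Leaf-node2 first
```

The number of sequences grows factorially with the number of islands, which is manageable for the two-island (L-type) and three-island (T-type) layers considered here.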



Fig. 10 A stacked layer has three voltage islands: (a) each voltage-island-based X-clock tree can be labelled as a leaf node, (b) the first connecting sequence $vi_1$, and (c) the sixth connecting sequence $vi_6$.


After determining the supply voltage of the local clock source, $SV_{LCS\text{-}j}$, and the supply voltage $V_{dd}$ of each island, for example $SV_{LCS\text{-}j}$ = 1.0 V, $V_{dd1}$ = 1.0 V, $V_{dd2}$ = 1.1 V, and $V_{dd3}$ = 1.2 V, we can build a voltage-island-based X-clock tree by combining the three leaf nodes and inserting the required level shifters. Fig. 11(a) shows the sixth connecting sequence, $vi_6$ = {Leaf-node$_3$, Leaf-node$_2$, Leaf-node$_1$}; the supply voltage $SV_{LCS\text{-}j}$ of the local clock source of $vi_6$ is 1.0 V in order to reduce power.


Fig. 11 (a) The sixth connecting sequence $vi_6$ = {Leaf-node$_3$, Leaf-node$_2$, Leaf-node$_1$}, with two inserted level shifters in the case of (b) or (c).

Figs. 11(b) and 11(c) show two connecting sequences for $vi_6$ with different level-shifter insertions. In Fig. 11(b), $LS_1$ delivers the local clock source operating at 1.0 V to Leaf-node$_3$ working at 1.2 V, and $LS_2$ converts the clock signal from 1.0 V to Leaf-node$_2$ working at 1.1 V. In Fig. 11(c), $LS_1$ delivers the local clock source operating at 1.0 V to Leaf-node$_2$ working at 1.1 V, and $LS_2$ converts the clock signal from 1.1 V to Leaf-node$_3$ working at 1.2 V. Hence, each connecting sequence of multivoltage islands has at least one connecting sequence with level-shifter insertion.

Therefore, we obtain a number of X-clock trees derived from the six connecting sequences $\{vi_1, vi_2, vi_3, vi_4, vi_5, vi_6\}$ for a stacked layer. We then select the one with the minimal power consumption under a reasonable delay. Given the best X-clock tree of each stacked layer, we finally integrate these X-clock trees by inserting a few TSVs to form a system X-clock tree with the best trade-off between power and delay.
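
The selection step can be sketched as a constrained minimum. The paper does not quantify what a reasonable delay is, so this sketch assumes a user-supplied delay limit; the fallback when no candidate meets the limit, the candidate names, and the power and delay figures are all hypothetical.

```python
def select_best_tree(candidates, delay_limit):
    """Return the lowest-power candidate whose delay is within delay_limit.

    candidates: list of dicts with 'name', 'power', and 'delay' keys.
    If no candidate meets the limit, fall back to the full list (assumption).
    """
    feasible = [c for c in candidates if c["delay"] <= delay_limit]
    pool = feasible if feasible else candidates
    return min(pool, key=lambda c: (c["power"], c["delay"]))


# Hypothetical power (mW) and delay figures for three candidate trees.
candidates = [
    {"name": "vi1", "power": 70.0, "delay": 210.0},
    {"name": "vi6-b", "power": 66.0, "delay": 260.0},
    {"name": "vi6-c", "power": 67.0, "delay": 220.0},
]
best = select_best_tree(candidates, delay_limit=230.0)
```

In this example the lowest-power tree, vi6-b, is rejected because its delay exceeds the limit, and vi6-c is chosen instead.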

4.5 Time Complexity of Proposed Algorithm 

The time complexity of the proposed algorithm 2.5D-MuVIX-DVI is analyzed as follows. PMXF [8] constructs the X-clock tree for each voltage island in $O(q \log q)$, where $q$ is the number of sinks in the island, a part of the total $n$ sinks. DVI-X [9] inserts double vias for each voltage island in $O(p^3)$, where $p$ is the number of single vias. For each connecting sequence, it takes $O(m \log m)$ to combine $m$ leaf nodes with inserted level shifters, where $m \ll n$, $m \ll p$, and $p$ is proportional to $q$. Hence, the time complexity of the 2.5D-MuVIX-DVI algorithm is $O(n \log n) + O(p^3)$.


V. Experimental Results 

The proposed algorithm has been implemented in C/C++ and run on an MS-Windows 8.1 machine with a dual-core Intel i7 CPU at 2.2 GHz and 8 GB of RAM. Table 1 lists the fabrication parameters of the FED delay model [16], the level shifter (LS) in a 130 nm process [18], and the TSV model in 130 nm technology [19]. The test benchmarks are IBM r1-r5 [7].


Table 1

Technology parameters of the FED delay model, a level shifter, and a TSV under 130 nm.

r (Ω/μm)      0.623      D   1.12673 ln2    r_LS (Ω)    250     c_TSV (fF)    15.48
c_a (fF/μm)   0.00598    E   1.10463 ln2    c_LS (fF)   23.5    r_TSV (Ω)     0.035
c_f (fF/μm)   0.043      F   1.04836 ln2    T_LS (ps)   54.4    F_clk (Hz)    100M


For all experiments, the sinks of a benchmark are first randomly partitioned into two equal groups, the upper-layer and bottom-layer sinks. Each layer is then divided into two voltage islands (L-type) or three voltage islands (T-type). An X-clock tree is constructed for each voltage island using the PMXF algorithm [8], followed by the DVI-X algorithm [9] for double-via insertion with skew tuning for skew minimization. After that, we extend the PMXF algorithm to integrate all the island-based X-clock subtrees, inserting level shifters wherever the clock signal is delivered from a low-voltage island to a high-voltage island. This gives several different connections, depending on the sequence of islands and the associated supply voltages and level shifters, and we select the one that minimizes power consumption under a reasonable clock delay. Finally, we integrate the two L-type (T-type) X-clock trees of the upper and bottom layers into a 2.5D L-type (T-type) X-clock tree through a TSV connection to obtain the best trade-off between power and delay.

Table 2 shows the results of the 2.5D L-type and T-type voltage-island-based X-clock trees with double-via insertion in terms of total vias, power consumption, clock delay, and CPU time. The T-type uses more vias than the L-type on three of the five benchmarks (r1, r2, and r5), but it has lower power, delay, and running time than the L-type on every benchmark except r1. These results also depend on how each benchmark is partitioned into stacked layers and how its clock sinks are distributed on those layers.


Table 2

Results of 2.5D L-type and T-type voltage-island X-clock trees with double-via insertion in via count, power, delay, and CPU time.

                      Total via           Power (mW)            Delay (ns)              CPU time (s)
Benchmark   #Sinks    L-type    T-type    L-type     T-type     L-type      T-type      L-type     T-type
r1          267       2391      2402      66.697     67.334     217.8951    229.845     4.728      5.024
r2          598       5484      5517      171.224    160.102    776.581     485.693     16.563     6.842
r3          862       7962      7746      247.915    240.616    903.977     821.331     17.106     11.256
r4          1903      17886     17529     576.786    566.135    2354.949    2041.169    138.352    83.762
r5          3101      26617     27924     946.117    885.935    4408.646    3523.985    649.125    315.378

Table 3 compares the 2.5D and 2D voltage-island L-types against a single voltage island, all with double-via insertion. On average, the 2.5D L-type saves 9.4% (i.e., 1-0.906) in power and 29.62% (i.e., 1-0.7038) in delay relative to the single voltage island [9]. Relative to the 2D L-type [20], its normalized delay is lower by 21.54 percentage points (i.e., 0.9192-0.7038), but its normalized power is higher by 12.3 percentage points (i.e., 0.906-0.783).
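
The averaged ratios quoted above can be reproduced from the per-benchmark power values in Table 3. The short Python check below uses only numbers that appear in the table; the variable names are illustrative.

```python
# Power (mW) for benchmarks r1-r5, copied from Table 3.
single_island = [80.237, 195.091, 261.996, 612.901, 1013.366]
l_type_25d = [66.697, 171.224, 247.915, 576.786, 946.117]

# Each ratio normalizes the 2.5D L-type power to the single-island power.
ratios = [p / base for p, base in zip(l_type_25d, single_island)]
average_ratio = sum(ratios) / len(ratios)
power_saving_percent = (1.0 - average_ratio) * 100.0
```

The average ratio evaluates to about 0.906, which corresponds to the 9.4% average power saving reported for the 2.5D L-type.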


Table 3

Comparison of 2.5D vs 2D L-type voltage-island (1.0V/1.2V) and single voltage-island (1.2V) X-clock trees with double-via insertion in power and delay. Ratios are relative to the single island [9].

Power (mW)
Benchmark   Single island [9]   2D L-type [20]   2D L-type ratio   2.5D L-type   2.5D L-type ratio
r1          80.237              59.901           0.7466            66.697        0.8312
r2          195.091             147.617          0.7567            171.224       0.8777
r3          261.996             211.568          0.8075            247.915       0.9463
r4          612.901             491.303          0.8016            576.786       0.9411
r5          1013.366            813.438          0.8027            946.117       0.9336
Average     -                   -                0.7830            -             0.9060

Delay (ns)
Benchmark   Single island [9]   2D L-type [20]   2D L-type ratio   2.5D L-type   2.5D L-type ratio
r1          278.317             284.837          1.0234            217.8951      0.7829
r2          858.636             853.203          0.9937            776.581       0.9044
r3          1453.014            1344.603         0.9254            903.977       0.6221
r4          4101.838            3011.99          0.7343            2354.949      0.5741
r5          6938.616            6377.82          0.9192            4408.646      0.6354
Average     -                   -                0.9192            -             0.7038


Table 4 compares the 2.5D and 2D voltage-island T-types against a single voltage island, all with double-via insertion. On average, the 2.5D T-type saves 12.48% (i.e., 1-0.8752) in power and 40.75% (i.e., 1-0.5925) in delay relative to the single voltage island [9]. Relative to the 2D T-type [20], its normalized delay is lower by 35.63 percentage points (i.e., 0.9488-0.5925), but its normalized power is higher by 9.8 percentage points (i.e., 0.8752-0.7772).


Table 4

Comparison of 2.5D vs 2D T-type voltage-island (1.0V/1.1V/1.2V) and single voltage-island (1.2V) X-clock trees with double-via insertion in power and delay. Ratios are relative to the single island [9].

Power (mW)
Benchmark   Single island [9]   2D T-type [20]   2D T-type ratio   2.5D T-type   2.5D T-type ratio
r1          80.237              65.776           0.8198            67.334        0.8392
r2          195.091             140.134          0.7183            160.102       0.8207
r3          261.996             205.05           0.7827            240.616       0.9184
r4          612.901             494.468          0.8068            566.135       0.92369
r5          1013.366            768.672          0.7585            885.935       0.8742
Average     -                   -                0.7772            -             0.8752

Delay (ns)
Benchmark   Single island [9]   2D T-type [20]   2D T-type ratio   2.5D T-type   2.5D T-type ratio
r1          278.317             285.819          1.0270            229.845       0.8258
r2          858.636             823.275          0.9588            485.693       0.5657
r3          1453.014            1378.38          0.9486            821.331       0.5653
r4          4101.838            3624.769         0.8837            2041.169      0.4976
r5          6938.616            6424.236         0.9259            3523.985      0.5079
Average     -                   -                0.9488            -             0.5925


From Tables 3 and 4, averaging the L-type and T-type results, the 2.5D voltage-island approach reduces power by 10.94% (i.e., (9.4%+12.48%)/2) and delay by 35.185% (i.e., (29.62%+40.75%)/2) relative to the single voltage island [9].


Fig. 12 presents 2.5D X-clock trees of the benchmark r5 that are based on L-type two voltage islands for the cases of 12(a) 
upper layer, 12(b) bottom layer, and 12(c) combination of two stacked layers. Fig. 13 shows 2.5D X-clock trees of the 
benchmark r5 that are based on T-type three voltage islands for the cases of 13(a) upper layer, 13(b) bottom layer, and 13(c) 
combination of two stacked layers. 


Fig. 12 2.5D X-clock trees of the benchmark r5 based on L-type two voltage islands for (a) the upper layer, (b) the bottom layer, and (c) the combination of the two stacked layers.

Fig. 13 2.5D X-clock trees of the benchmark r5 based on T-type three voltage islands for (a) the upper layer, (b) the bottom layer, and (c) the combination of the two stacked layers.


VI. Conclusion 

The 2.5D X-clock tree construction with double-via insertion, based on the multivoltage-island combination of stacked layers, has been successfully implemented. Relative to a single voltage island, power consumption and clock delay are reduced by 10.94% and 35.185% on average, respectively. The current version supports L-type and T-type multivoltage islands on two stacked layers. Future work can extend the X-clock tree construction to multiple modes, more stacked layers, and more island types.


Acknowledgements 

The author would like to thank the funding institutes for financially supporting this research under contract numbers 
MOST 103-2221-E-343-005, NHU-104 research project, and MOST 105-2221-E-343-004. 


References 

[1] W. K. Mak and J. W. Chen, “Voltage island generation under performance requirement for SoC designs,” IEEE Design Automation 
Conference in Asia and South Pacific, pp. 798-803, Jan. 2007. 

[2] EE Times, The State of the Art in 3D IC Technologies, Nov. 27, 2013. 

[3] W.-P. Lee, H.-Y. Liu, and Y.-W. Chang, “Voltage-island partitioning and floorplanning under timing constraints,” IEEE Trans. on 
CAD of Integrated Circuits and Systems, vol. 28, no. 5, pp. 690-702, May 2009. 

[4] B. Yu, S. Dong, and S. Goto, “Multi-voltage and level-shifter assignment driven floorplanning,” IEEE The 5th International 
Conference on ASIC, pp. 1264-1267, Oct. 2009. 

[5] Tai-Hsuan Wu, A. Davoodi, and J. T. Linderoth, “Power-driven global routing for multi-supply voltage domains,” Design, Automation 
and Test in Europe, Mar. 2011. 

[6] Byunghoon Lee, Eui-Young Chung, and Hyuk-Jun Lee, “Voltage islanding technique for concurrent power and temperature 
optimization in 3D-stacked ICs,” The 29th Int. Technical Conference on Circuits/Systems, Computers and Communications 
(ITC-CSCC), pp. 267-269, July 2014. 

[7] R. S. Tsay, “Exact zero skew,” IEEE International Conference on Computer-Aided Design, pp. 336-339, 1991. 

[8] Chung-Chieh Kuo, Chia-Chun Tsai, and Trong-Yen Lee, “Pattern-matching-based X-Architecture zero-skew clock tree construction 
with X-Flip technique and via delay consideration,” Integration, the VLSI Journal, vol. 44, no. 1, pp. 87-101, Jan. 2011. 

[9] Chia-Chun Tsai, Chung-Chieh Kuo, and Trong-Yen Lee, “Post-routing double-via insertion for X-Architecture clock tree yield 
improvement,” IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences, vol. E94-A, no. 2, 
pp. 706-716, Feb. 2011. 

[10] C. C. Tsai, T. H. Lin, S. H. Tsai, and H. M. Chen, “Clock planning for multi-voltage and multi-mode designs,” IEEE International 
Symposium on Quality Electronic Design, pp. 654-658, Mar. 2011. 

[11] Kuan-Yu Lin, Hong-Ting Lin, Tsung-Yi Ho, and Chia-Chun Tsai, “Load-balanced clock tree synthesis with adjustable delay buffer 
insertion for clock skew reduction in multiple dynamic supply voltage designs,” ACM Trans. on Design Automation of Electronic 
Systems, vol. 17, no. 3, Article 34, 2012. 

[12] T. Y. Kim and T. Kim, “Clock tree synthesis for TSV-based 3D IC designs,” ACM Transactions on Design Automation of Electronic 
Systems, vol. 16, no. 4, Article 48, 2011. 

[13] F. W. Chen and T. T. Hwang, “Clock tree synthesis with methodology of re-use in 3D IC,” IEEE/ACM Design Automation 
Conference, pp. 1094-1099, 2012. 

[14] S. J. Wang, C. H. Lin, and Katherine S. M. Li, “Synthesis of 3D Clock Tree with Pre-bond Testability,” IEEE International 
Symposium on Circuits and Systems, pp. 2654-2657, 2013. 

[15] Chia-Chun Tsai, “2.5D X-Clock Tree Construction Based on Stacked-Layer Combination of Multivoltage Islands,” IEEE The Third 
International Symposium on Computer, Consumer and Control, pp. 443-446, July 4-6, 2016. 

[16] A. I. AbouSeido, B. Nowak, and C. Chu, “Fitted Elmore delay: a simple and accurate interconnect delay model,” IEEE Transactions 
on Very Large Scale Integration Systems, vol. 12, no. 7, pp. 691-696, 2004. 



[17] T. Y. Ho, Y. W. Chang, S. J. Chen, and D. T. Lee, “Crosstalk- and performance-driven multilevel full-chip routing,” IEEE Trans. on 
CAD of Integrated Circuits and Systems, vol. 24, no. 6, pp. 869-878, 2005. 

[18] T. C. Chen, S. R. Pan, and Y. W. Chang, “Timing modeling and optimization under the transmission line model,” IEEE Transactions 
on Very Large Scale Integration Systems, vol. 12, no. 1, pp. 28-41, 2004. 

[19] P. Sarkar and C. K. Koh, “Routability-driven repeater block planning for interconnect-centric floorplanning,” IEEE Trans. on CAD of 
Integrated Circuits and Systems, pp. 660-671, 2001. 

[20] Chia-Chun Tsai and Trong-Yen Lee, “Power awareness for multi-voltage island X-clock tree construction with double-via insertion,” 
The 4th Asia Symposium on Quality Electronic Design, pp. 187-192, July 10-11, 2012. 






